This article is intended for those who would like to develop their own I2P client from scratch. Familiarity with the basic concepts of I2P is assumed. There is already plenty of introductory documentation and articles on the subject, including material translated into Russian. There is also official documentation that describes the protocols and message formats quite well. Unfortunately, it is scattered, and many non-obvious details are missing. This article was written primarily on the basis of studying and debugging the official I2P Java client. The ultimate goal is to implement an I2P client entirely in C++. The source code of the project in its current state is available on GitHub.
Encryption used
To build your own I2P router, you need implementations of the following encryption, hashing and checksum algorithms:
- ElGamal. Asymmetric encryption based on modular exponentiation. The base and the modulus are fixed constants for the entire I2P network. In addition to the standard block size of 514 bytes, a non-standard block size of 512 bytes is also used.
- Diffie-Hellman, to derive a shared symmetric encryption key by exchanging public keys. The same base and modulus are used as for ElGamal.
- DSA, for creating and verifying digital signatures.
- AES in two modes: CBC, which uses an encryption key and an initialization vector (IV), and ECB, which is used to encrypt the 16-byte IV itself.
- SHA256, for calculating hashes.
- Adler32, for calculating message checksums.
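To make the Diffie-Hellman item concrete, here is a minimal Python sketch of key agreement by modular exponentiation. The modulus and base below are toy values chosen for readability and are assumptions of this example, not the network-wide I2P constants; the function names are hypothetical as well.

```python
import secrets

# Toy parameters for illustration only (assumption): a small prime modulus
# and base. A real router would use the network-wide constants instead.
P = 0xFFFFFFFB  # 4294967291, a prime just below 2**32
G = 5

def make_keypair():
    """Return (private, public) where public = G**private mod P."""
    private = secrets.randbelow(P - 3) + 2  # 2 <= private <= P - 2
    public = pow(G, private, P)
    return private, public

def shared_secret(own_private, peer_public):
    """Each side raises the peer's public value to its own private exponent."""
    return pow(peer_public, own_private, P)

a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()
# Both sides arrive at the same value without ever sending it over the wire.
assert shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub)
```

Only the public values cross the network; the shared value would then be used to derive the symmetric key.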
Basic protocols
The I2P network consists of 4 main layers:
- Transport layer. Encrypted Internet connections over TCP/IP or UDP. Covers connection establishment and encryption.
- Tunnels. “Windows” of nodes to the outside world, located on other nodes and allowing a node to hide its true location. They consist of a sequence of nodes interconnected by transport layer protocols. In simplified terms, a tunnel can be thought of as a chain of proxy servers that anonymizes both the client and the server.
- “Garlic”. Transmission of messages or sequences of messages between two end nodes via arbitrary routes and tunnels. It is characterized by session identifiers and by asymmetric encryption, which is replaced by symmetric encryption once a session is established.
- Application layer protocols for transferring user data between nodes.
Each layer adds its own encryption for a different purpose. Transport layer encryption hides traffic from the Internet provider; tunnel encryption hides the content and direction of messages from intermediate tunnel nodes; “garlic” encryption hides the content from the end nodes of tunnels when messages are passed between tunnels.
Transport layer
To establish a transport layer connection, you need to know the IP address and port of the peer. The list of known nodes, called netDb, changes during operation; information about new nodes comes from other nodes. Initially, the list of nodes is downloaded from special sites whose addresses are explicitly listed in the file router/networkdb/Reseeder.java. The protocol running on top of TCP/IP is called NTCP, and the one running on top of UDP is called SSU. Apart from some differences in connection setup, SSU, because of its packet nature, supports splitting long messages into several fragments. Each transmitted message consists of a header, an I2NP message (more about the I2NP protocol below) and a checksum. A special message containing the current time is transmitted periodically for synchronization purposes. When a connection is established, the routers exchange public keys, and each side independently computes a shared AES key from them using the Diffie-Hellman algorithm.
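The "header, I2NP message, checksum" framing can be sketched as follows. The exact layout here is an assumption made for illustration (a 2-byte big-endian length as the header, zero padding up to a multiple of the AES block size, and a trailing 4-byte Adler-32 over everything before it); the function names are hypothetical, and the real NTCP and SSU formats should be taken from the official specification.

```python
import struct
import zlib

BLOCK = 16  # AES block size; the frame is padded to a multiple of it

def build_frame(i2np_message: bytes) -> bytes:
    """Wrap a message as: 2-byte length | message | padding | Adler-32."""
    body = struct.pack(">H", len(i2np_message)) + i2np_message
    # Pad so that body plus the 4-byte checksum fills whole AES blocks.
    pad_len = -(len(body) + 4) % BLOCK
    body += bytes(pad_len)  # zero padding, chosen here for simplicity
    return body + struct.pack(">I", zlib.adler32(body))

def parse_frame(frame: bytes) -> bytes:
    """Verify the checksum and return the embedded message."""
    body = frame[:-4]
    (checksum,) = struct.unpack(">I", frame[-4:])
    if zlib.adler32(body) != checksum:
        raise ValueError("Adler-32 mismatch")
    (length,) = struct.unpack(">H", body[:2])
    return body[2:2 + length]
```

In a real transport the whole frame would additionally be encrypted with the AES session key before being written to the socket; that step is omitted here.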
Tunnels
Tunnels are always unidirectional: messages can only travel from the input node (Gateway) to the output node (Endpoint). Depending on which end of the tunnel belongs to its owner, who has all the information about the tunnel, tunnels are divided into incoming (the owner is the output node) and outgoing (the owner is the input node). The intermediate nodes of a tunnel do not know whether the tunnel is incoming or outgoing; the only thing an intermediate node does is encrypt the message with its own encryption key and pass it to the next node. An important consequence follows: the sequential decryption of tunnel messages must be carried out by the owner, since only the owner has the encryption keys of all intermediate nodes. For incoming tunnels this is fairly obvious: having received a message, the output node decrypts it layer by layer. For outgoing tunnels, however, the original unencrypted message must be sequentially decrypted before it is sent, so that the encryption applied by the nodes along the way restores the original. Tunnels that a node does not own are called transit tunnels. Transit tunnels carry other participants' traffic and are necessary for the functioning of the entire I2P network; carrying them is what turns a node into a router.
Tunnel nodes use AES encryption with three different keys. One is used to encrypt the node's response when the tunnel is created. The other two are used to transmit data through the tunnel: one key encrypts the data itself, and the other encrypts the initialization vector (IV) used to encrypt the data. The IV is encrypted with the same key twice, once before the data is encrypted and once after; this is called double encryption. The node receives these two keys in its record of the tunnel creation message, encrypted with its public key using ElGamal.
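The point that the owner always decrypts, whether before sending or after receiving, can be illustrated with a small Python sketch. The cipher below is a deliberately trivial, insecure stand-in for the per-node AES layer (an assumption of this example, chosen only because it is invertible and order-dependent); all names are hypothetical.

```python
def toy_encrypt(data: bytes, key: int) -> bytes:
    """Toy stand-in for one node's AES layer (NOT real cryptography)."""
    return bytes((3 * (b + key)) % 256 for b in data)

def toy_decrypt(data: bytes, key: int) -> bytes:
    """Inverse of toy_encrypt; 171 is the inverse of 3 modulo 256."""
    return bytes((171 * b - key) % 256 for b in data)

def through_nodes(data: bytes, node_keys) -> bytes:
    """What the tunnel does: every node encrypts with its own key in turn."""
    for key in node_keys:
        data = toy_encrypt(data, key)
    return data

def owner_decrypt(data: bytes, node_keys) -> bytes:
    """What the owner does: peel the layers off, last node's layer first."""
    for key in reversed(node_keys):
        data = toy_decrypt(data, key)
    return data

node_keys = [7, 42, 99]
message = b"tunnel payload"

# Incoming tunnel: the nodes encrypt, the owner decrypts after receiving.
assert owner_decrypt(through_nodes(message, node_keys), node_keys) == message

# Outgoing tunnel: the owner decrypts before sending, and the encryption
# applied by the nodes along the way restores the original message.
assert through_nodes(owner_decrypt(message, node_keys), node_keys) == message
```

The same `owner_decrypt` routine serves both directions, which is why an intermediate node cannot tell an incoming tunnel from an outgoing one.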
Inside tunnels, only TunnelData messages are transmitted, generally consisting of several fragments. The TunnelGateway message is used for transmission between tunnels. Although the official documentation says that a two-way connection needs at least 4 tunnels (2 incoming and 2 outgoing), in fact it is not necessary to send messages through outgoing tunnels: a TunnelGateway message can be sent directly to the input node of the desired incoming tunnel.
In a TunnelData message, the checksum is calculated over the content data that follows the zero byte, with the unencrypted IV appended to it.
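A minimal Python sketch of this checksum follows. It assumes that the hash is SHA-256 truncated to its first 4 bytes and that the decrypted message starts with the checksum, followed by non-zero padding and a zero byte delimiter; both details are this example's reading of the format and should be checked against the official tunnel message specification. The function names are hypothetical.

```python
import hashlib

def tunnel_data_checksum(content: bytes, iv: bytes) -> bytes:
    """Hash the content that follows the zero byte with the IV appended."""
    if len(iv) != 16:
        raise ValueError("IV must be 16 bytes")
    return hashlib.sha256(content + iv).digest()[:4]

def verify_tunnel_data(decrypted: bytes, iv: bytes) -> bytes:
    """Check a decrypted message and return its content.

    Assumed layout: 4-byte checksum | non-zero padding | 0x00 | content.
    """
    checksum = decrypted[:4]
    zero_at = decrypted.index(0, 4)       # first zero byte after the checksum
    content = decrypted[zero_at + 1:]
    if tunnel_data_checksum(content, iv) != checksum:
        raise ValueError("TunnelData checksum mismatch")
    return content
```

The IV passed in must be the unencrypted one, so the check can only be made by a party that has already removed all encryption layers.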
I2NP protocol
Data exchange within the I2P network is carried out with I2NP messages of various types. Each message contains a header with its type and length, which makes it possible to determine the boundaries between messages. Depending on the type, the message length can vary from 20 bytes to 64 KB. Each layer uses “wrapper” messages that contain other I2NP messages from a higher layer. For tunnels, such “wrappers” are TunnelData messages for transmission within tunnels and TunnelGateway messages for transmission between tunnels; for “garlic”, the wrapper is the Garlic message. Most I2P traffic consists of the following nested messages:
Data->Garlic->TunnelData.
As a rule, messages are transmitted through tunnels, although they can also be sent directly between routers, in particular for the initial creation of new tunnels. Routers also exchange DatabaseStore messages immediately after establishing a connection. Messages between destinations should be sent via garlic, since the field that identifies the destination is present only there.
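The role of the type and length fields in finding message boundaries can be shown with a short Python sketch. The header layout used here (1-byte type followed by a 2-byte big-endian payload length) is a simplified, hypothetical one invented for this example; the real I2NP header carries additional fields, and the type numbers in the usage lines are illustrative only.

```python
import struct

# Hypothetical simplified header: 1-byte type, 2-byte payload length.
HEADER = struct.Struct(">BH")

def wrap(msg_type: int, payload: bytes) -> bytes:
    """Prefix a payload with the simplified header."""
    return HEADER.pack(msg_type, len(payload)) + payload

def split_messages(stream: bytes):
    """Yield (type, payload) pairs; stop at the first incomplete message."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        msg_type, length = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        if start + length > len(stream):
            break  # the rest of this message has not arrived yet
        yield msg_type, stream[start:start + length]
        offset = start + length

# Two complete messages followed by a truncated third one.
stream = wrap(1, b"first") + wrap(2, b"second") + wrap(3, b"third")[:-2]
assert list(split_messages(stream)) == [(1, b"first"), (2, b"second")]
```

Because a wrapper's payload is itself a complete message, the same `wrap` call applied repeatedly gives the kind of nesting described above.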
Routers and destinations
To work on an I2P network, you need an I2P client, which consists of a router that provides access to the I2P network and destinations for exchanging meaningful information. Information about routers, including their IP addresses, is publicly available; moreover, the current list of routers can be downloaded from special ftp sites. Information about the location of destinations, on the other hand, is confidential. Which destinations are located on a given router is known only to that router; nobody else can obtain this information, which is one of the main mechanisms ensuring the anonymity of the I2P network.
Since routers are mainly located on the computers of network participants, their composition changes all the time. Routers therefore have to keep their list of other routers constantly up to date. This process is called "probing" (exploratory) and consists of sending requests with a randomly selected 32-byte address to special routers called floodfill routers. It is assumed that floodfill routers have complete information about the network. Among other things, floodfill routers constantly exchange information about newly discovered nodes with each other.
To request information about a node, the I2NP DatabaseLookup message is used, and the DatabaseStore message is used to transmit the information itself. Typically, messages are transmitted through tunnels, but a node sends its DatabaseStore directly at the transport level immediately after a connection is established, thereby informing the network of its existence. Otherwise, building tunnels through new nodes would be impossible.
A DatabaseStore message can carry one of two structures for a given address: RouterInfo, in which case the address belongs to a router, or LeaseSet, in which case it belongs to a destination.
RouterInfo contains the public keys of the router, as well as a variety of service information, the most important of which are the IP addresses, ports and supported transport protocols for connecting to it, and whether the router is a floodfill. Since RouterInfo can contain quite a lot of text, it is transmitted gzipped.
LeaseSet contains a list of incoming tunnels for a given destination, as well as a public key for encrypting garlic messages addressed to this destination.
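The gzip step for RouterInfo can be sketched in a few lines of Python. The 2-byte big-endian length prefix in front of the compressed data is an assumption of this example, and the function names and the sample text are hypothetical; the exact DatabaseStore layout should be taken from the official specification.

```python
import gzip
import struct

def pack_router_info(router_info: bytes) -> bytes:
    """Compress RouterInfo and prefix it with the compressed length."""
    compressed = gzip.compress(router_info)
    return struct.pack(">H", len(compressed)) + compressed

def unpack_router_info(field: bytes) -> bytes:
    """Read the length prefix and decompress exactly that many bytes."""
    (size,) = struct.unpack_from(">H", field)
    return gzip.decompress(field[2:2 + size])

# Illustrative payload only; a real RouterInfo is a binary structure.
sample = b"host=192.0.2.1;port=12345;caps=f;" * 20
packed = pack_router_info(sample)
assert unpack_router_info(packed) == sample
assert len(packed) < len(sample)  # repetitive text compresses well
```

A LeaseSet, by contrast, is small and mostly binary, so compressing it would gain little.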
Application Layer Services
Let's consider the meaningful actions of an I2P client: anonymous hosting of online resources and, accordingly, access to them. First, let's try to get data from some website, for example, Flibusta. At this point we only have the 32-byte hash of its I2P address; our goal is to send an HTTP request and receive a response.
Of course, there is no router with such an address in the database (otherwise the IP address of the resource would be visible to everyone), so the only way to send a request is through one of the destination's incoming tunnels that exists at that moment, and to find one you must first request and receive the LeaseSet. Unlike RouterInfo, which can be requested and received from a neighbor at the transport layer, a LeaseSet can only be requested and received through tunnels, which must first be built. This leads to the disappointing conclusion that an I2P network cannot be used “on demand”: the I2P router must be running and constantly building and maintaining tunnels. Due to the decentralized nature of the network, building tunnels is a very difficult task, and most attempts to create tunnels end in failure.
To successfully build a tunnel, two conditions are required:
- All nodes participating in the tunnel must be reachable at the transport layer, at least by the previous node in the tunnel.
- All nodes participating in the tunnel must agree to build the new tunnel. A node may refuse to take part, for example, because it is congested.
The maximum lifetime of a tunnel is 10 minutes; a tunnel can cease to exist earlier if a node participating in it goes offline. Therefore, tunnel owners constantly send test messages to keep the list of “live” tunnels up to date.
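The two build conditions and the lifetime rule can be captured in a small Python sketch of the owner's bookkeeping. The class, its fields and the way replies are represented (True for accepted, False for refused, None for no answer) are all hypothetical choices made for this example.

```python
import time

TUNNEL_LIFETIME = 10 * 60  # seconds: the maximum tunnel lifetime

def build_succeeded(replies) -> bool:
    """A build succeeds only if every node answered and accepted.

    Each reply is True (accepted), False (refused) or None (unreachable).
    """
    return len(replies) > 0 and all(reply is True for reply in replies)

class Tunnel:
    def __init__(self, nodes, created_at=None):
        self.nodes = nodes
        self.created_at = time.monotonic() if created_at is None else created_at
        self.last_test_ok = True  # updated by periodic test messages

    def is_alive(self, now=None) -> bool:
        """Alive means: not expired and the latest test message got through."""
        now = time.monotonic() if now is None else now
        return self.last_test_ok and now - self.created_at < TUNNEL_LIFETIME

def live_tunnels(tunnels, now=None):
    """Filter the pool down to tunnels that can still be used."""
    return [t for t in tunnels if t.is_alive(now)]
```

A router would typically start building a replacement some time before a tunnel expires, so that the pool of live tunnels never runs empty.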
So, the tunnels are available and so is the necessary LeaseSet. Now we can send an HTTP request, and it will even reach the recipient, but we would also like to receive a response. To do this, we must include our own LeaseSet in our message; the response will then be sent to us through one of our incoming tunnels and will most likely reach our node safely. Since several connections can operate simultaneously through our node, either each of them must be assigned its own I2P address with a LeaseSet made up of several incoming tunnels, or a “shared” address must be created that multiplexes connections using a special protocol with the corresponding fields, which acts as a “wrapper” over the application layer protocol. This protocol is called I2CP, and the official I2P client uses it exclusively, although it is not required for building your own services. Of course, to access Flibusta you should use I2CP, since that is what it expects. However, to build, for example, your own torrent-like network, you can get by with I2P addressing alone.
The I2CP protocol and the protocol stack built on top of it are a separate topic, covered in a separate article.